Goto

Collaborating Authors

 truncated linear regression


Review for NeurIPS paper: Truncated Linear Regression in High Dimensions

Neural Information Processing Systems

Weaknesses: - My major concern is why the problem is difficult. Assumption 1 literally enforces that the adversary cannot pick arbitrary S, but only those such that a constant alpha-fraction of the observations are hidden/removed. Thus, suppose before removal we have a total of m samples (a, y). After removal it reduces to alpha * m pairs (a, y), which still suffices for accurate recovery provided that m O(k log n). - It is not convincing to me that the sample complexity in Theorem 3.1 is near-optimal. I know that O(k log n) is near-optimal, but does your result really imply such bound?


Review for NeurIPS paper: Truncated Linear Regression in High Dimensions

Neural Information Processing Systems

The truncated setting is interesting both in practice and theoretically. The reviewers uniformly felt that this is an interesting paper, and a good contribution to the community.


Truncated Linear Regression in High Dimensions

Neural Information Processing Systems

As in standard linear regression, in truncated linear regression, we are given access to observations (Ai, yi)i whose dependent variable equals yi Ai {\rm T} \cdot x * \etai, where x * is some fixed unknown vector of interest and \etai is independent noise; except we are only given an observation if its dependent variable yi lies in some "truncation set" S \subset \mathbb{R}. The goal is to recover x * under some favorable conditions on the Ai's and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering k-sparse n-dimensional vectors x * from m truncated samples, which attains an optimal \ell2 reconstruction error of O(\sqrt{(k \log n)/m}). As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, such as that which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaption of the LASSO optimization problem that accommodates truncation.